ABSTRACT


This  report  reflects  current  developments  in 
on-line  information  retrieval  systems  and  suggests 
goals  for  future  developments  of  these  systems.  The 
review  and  comparison  of  the  three  major  on-line 
information  retrieval  systems  discussed  in  the  paper 
reflect  current  achievements  in  working  on-line 
information  retrieval  systems.  A  final  section  is 
devoted to suggestions for exploiting the potential
of  on-line  remote  access  information  retrieval  and 
display  systems. 


ADMINISTRATIVE  INFORMATION 

This  work  was  accomplished  in  the  Information  Retrieval 
Division of the Department of Applied Mathematics under Project
SR00308, Task No. SR0030801, Title: Remote Use of Computers for
Naval  Information  Systems. 
I. INTRODUCTION


This  report  presents  a  review  and  evaluation  of  three  remote 
access  on-line  information  retrieval  systems  and  some  ideas  on 
what  the  capabilities  of  an  ideal  on-line  information  retrieval 
system should be. The three systems reviewed are the DDC Remote
On-Line  Retrieval  System,  the  National  Aeronautics  and  Space 
Administration  RECON  System,  and  the  Technical  Information  Program 
(TIP)  of  the  Massachusetts  Institute  of  Technology. 

The  DDC  system  is  examined  in  detail  for  two  reasons.  It  is 
the  newest  of  the  three  systems  and  reflects  many  of  the  latest 
advances  in  the  utilization  of  on-line  capabilities  for 
information  retrieval.  It  is  currently  in  use  at  the  Naval  Ship 
R&D Center (NSRDC), and the author has had direct personal
experience  in  its  use  and  development. 

Each  of  the  three  systems  is  reviewed  on  the  basis  of  its 
operation  during  the  last  quarter  of  1969.  This  paper  does  not 
reflect  changes  in  the  systems  which  have  occurred  since  then. 
II. THE DDC REMOTE ACCESS ON-LINE INFORMATION RETRIEVAL SYSTEM¹
A.  INTRODUCTION 

The  Defense  Documentation  Center  (DDC)  Remote  On-Line  Retrieval 
System  is  an  experimental  program  which  has  the  capability  of  querying 
the  RD&T  Work  Unit  Information  System  (WUIS)  data  bank  from  remote 
terminals.  Its  primary  purposes  are  to  provide  direct  access  by  the 
user to the DDC 1498 WUIS data bank, and to determine whether the
relevancy  or  usefulness  of  stored  information  can  be  increased  through 
a  significant  reduction  in  delivery  time  as  compared  to  the  batch 
mode  of  operation. 

The  experimental  system  consists  of  seven  remote  terminals  which 
are  used  for  testing  and  evaluating  the  concepts  and  methods  applied. 
The  seven  terminals  are  deployed  as  follows: 

Three  at  DDC  (Cameron  Station,  Va.) 

One  at  DDR&E  (Pentagon) 

One at Army (Arlington, Va. - Estimated operational date, July 1971)

One at Navy (Naval Ship Research and Development Center
(NSRDC), Carderock, Md.)

One  at  Air  Force  (Andrews  Air  Force  Base,  Md.) 

One  at  NSA  (Ft.  George  G.  Meade,  Md.) 

Each  remote  terminal  is  equipped  with  a  Uniscope  cathode  ray 
tube  (CRT),  a  remote  buffer,  and  a  remote  printer  which  handle,  on 
a  time-sharing  basis,  all  communications  with  the  WUIS  data  bank. 
Circuits  between  DDC  and  remote  terminals  are  secured  through  the 
use  of  Telecommunications  Security  (TSEC)  equipment  to  protect 
transmission of classified material.

1.  References  are  listed  on  page  55 
The DDC Remote On-Line Terminal equipment and Telecommunications
Security equipment were installed at NSRDC in July 1969. Instruction
in the use of terminal equipment was provided by DDC, and the
terminal has been in operation ever since.

B. EQUIPMENT DESCRIPTION

1.  Uniscope  300  Visual  Communications  Terminal 

The  Single  Station  Uniscope  is  contained  in  one  case,  and 
consists  basically  of  a  display  screen,  memory,  control,  input/output 
section,  and  character  generator. 

Display  Screen.  The  display  screen  is  the  face  of  a  cathode 
ray  tube  (CRT)  with  a  viewing  surface  10  inches  wide  and  5  inches 
high.  A  display  format  of  64  characters  per  line  on  16  lines  per 
display  is  provided,  permitting  a  total  of  1,024  characters  to  be 
displayed  and/or  to  be  held  in  memory.  Spacing  between  characters 
is  consistent  from  one  end  of  the  screen  to  the  other,  and  the  size 
and  shape  of  the  characters  do  not  change  in  relation  to  their 
position on the screen. The character style maximizes legibility
and readability. Each character is 0.150 x 0.113 inches and is
readable from a distance of seven feet. Character brightness may
be varied by the operator from 70% of full brightness to full brightness.
Since  each  character  is  repainted  on  the  display  surface  60  times 
each  second,  no  flicker  or  jitter  is  perceptible  to  the  viewer. 

Operator  Controls.  The  operator  controls  consist  of  an 
alphanumeric  typewriter  keyboard,  cursor  control  keys,  editing  keys, 
indicators, function keys, and display controls.

Typewriter Keyboard. The typewriter keyboard very closely
resembles the standard electric typewriter. Its purpose is to enter
messages destined for the computer. As a key is depressed, its
character goes simultaneously to the display memory and to the display
screen. It remains displayed while the operator edits and verifies
the data before transmitting it to the computer.

Cursor. The cursor is a unique character that is displayed
on the CRT at all times. The cursor indicates the location at which
the next data character will be displayed. It also indicates the
starting position from which data will be transmitted to the computer.
Whenever the cursor is positioned on a displayable character, both
will blink automatically. The blinking prevents the operator from
losing track of the cursor when it is superimposed over a character.
The cursor advances one space for each character that is typed and
can be positioned by the cursor control keys. The cursor control
keys are nondestructive and do not affect the information in display
memory.

As the cursor moves to within eight positions of the end
of any line, an audible alarm (which can be turned off) will momentarily
sound and an indicator light which says "End of Line" will be lit.
The indicator will remain lit as long as the cursor occupies one of
the last eight character positions. Additionally, as the cursor
enters the bottom line of the display, the alarm will sound and the
"Last Line" indicator will be illuminated.
Cursor Control Keys. The eight cursor control keys listed
below  are  used  when  composing  or  editing  displayed  messages. 

Scan  Forward  -  This  positioning  key  moves  the  cursor 
forward  one  space  at  a  time,  or  ten  spaces  per 
second  when  held  down. 

Scan  Backward  -  This  positioning  key  moves  the  cursor 
backward  one  space  at  a  time,  or  ten  spaces  per 
second  if  held  down. 

Scan  Up  -  This  positioning  key  moves  the  cursor  up  one 
line  at  a  time,  or  ten  lines  per  second  if  held 
down. 

Scan  Down  -  This  positioning  key  moves  the  cursor  down 
one  line  at  a  time,  or  ten  lines  per  second  if  held 
down. 

Note:  The  repetition  rate  of  ten  spaces  per 
second  is  actually  adjustable  from  five  to 
twenty-five  spaces  per  second  and  may  be 
preset  anywhere  in  this  range. 

Cursor to Home - This key repositions the cursor to the
first character position on the display.

Return - The return key is similar to the carriage return
on a typewriter, and positions the cursor to the first
position of the next line.

Space - The space bar is in the position normally occupied
by the space bar of a standard typewriter keyboard
and moves the cursor forward one space for each
depression.

Tab - This is a special cursor positioning key that moves
the cursor forward until a special tab stop character
is detected in the display memory. If a tab stop
character is detected, the cursor will stop one character
beyond it. If no tab character is found, the
cursor will stop at the end of the display.

Back Space - The back space key is similar to the back
space  key  on  a  standard  typewriter  and  moves  the 
cursor  backwards  one  space  for  each  depression. 
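
The Tab key behavior described above can be sketched in Python. This is an illustration only: the function name is hypothetical, the tab character stands in for the tab stop character (whose actual code is not given), and the scan is assumed to start at the character under the cursor.

```python
TAB_STOP = "\t"  # stand-in; the actual tab stop character code is not given


def tab_forward(memory: str, cursor: int) -> int:
    """Return the 0-based cursor index after the Tab key is pressed."""
    last = len(memory) - 1
    stop = memory.find(TAB_STOP, cursor)   # scan forward from the cursor
    if stop == -1:
        return last                        # no tab stop found: end of display
    return min(stop + 1, last)             # one character beyond the tab stop
```

For a display memory of "AB", a tab stop, then "CD", a cursor at the first position would land on "C".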

Editing Keys. Five editing keys are used to correct or change
data  that  has  been  input  at  the  keyboard  or  received  from  the  computer. 
The  use  of  these  keys  is  straightforward. 

Character  erase 
Erase  to  end  of  line 
Erase to end of display
Insert 
Delete 

Function Keys. Forty special function keys are available.
Five are located above the numeric row of the typewriter keyboard.
The rest are located to the right of the typewriter keyboard. Only
five of the special function keys are currently programmed for use
within the WUIS Retrieval System.

Data Control Keys. There are two data control keys, the
transmit key and the message waiting key.

The transmit key causes data to be transferred to the
computer. When this key is pressed, all data (64-character maximum,
including spaces) on the same line and to the left of the cursor
will be transmitted. The keyboard is locked out from further data
entry until the message transmitted has been accepted by the computer.
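
The selection of data sent by the transmit key can be illustrated with a short Python sketch. It assumes the 1,024-character display memory is held as one string in row order and that the cursor is a 0-based index into it; the function name and the sample text are hypothetical.

```python
COLUMNS = 64  # characters per display line


def transmit_payload(memory: str, cursor: int) -> str:
    """Return the text on the cursor's line that lies to the left of the cursor."""
    line_start = (cursor // COLUMNS) * COLUMNS
    return memory[line_start:cursor]


# A message typed on the second line, with the cursor just past its last character.
screen = " " * COLUMNS + "RADAR ANTENNAS".ljust(COLUMNS)
sent = transmit_payload(screen, COLUMNS + len("RADAR ANTENNAS"))
```

The character under the cursor is excluded, following the wording "to the left of the cursor."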

The message waiting key is used in conjunction with
unsolicited  messages. 

Indicator  Lights.  There  are  six  indicators  located  above 
the  keyboard.  Their  labels  and  functions  are  as  follows: 

Last Line - Lights when the cursor is positioned
anywhere in the last line of the display.

End Line - Lights when the cursor is in any of the
last eight character positions of any line.

Fault - Lights whenever a parity error is detected in
the message that is being received from the computer.

Hi-Temp - Lights to warn the operator that the internal
temperature is exceeding the normal limit.

Message Waiting - Lights whenever an unsolicited message
is to be received from the computer.

Wait - Lights during the time a message is being transmitted
or received.

Audible Alarm. An audible alarm will sound to alert the
operator to three possible conditions. A single beep is sounded
when the cursor moves into the 57th character position of any line
and also when the cursor moves into the first character position
of the last line. The alarm also sounds intermittently whenever
an unsolicited computer message is waiting. The alarm is turned
off when the message waiting switch is depressed. It also may be
adjusted so that it does not sound at any time.
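
The cursor-position rules for the End Line and Last Line indicators and for the single beep can be summarized in a brief Python sketch. The function and key names are hypothetical, and positions are counted from 1 as in the text.

```python
COLUMNS = 64                       # characters per line
ROWS = 16                          # lines per display
END_ZONE_START = COLUMNS - 8 + 1   # column 57, first of the last eight positions


def cursor_signals(row: int, col: int) -> dict:
    """Signals for a cursor that has just moved to 1-based (row, col)."""
    if not (1 <= row <= ROWS and 1 <= col <= COLUMNS):
        raise ValueError("cursor is outside the 64 x 16 display")
    return {
        "end_line_light": col >= END_ZONE_START,
        "last_line_light": row == ROWS,
        "single_beep": col == END_ZONE_START or (row == ROWS and col == 1),
    }
```

The sketch ignores the alarm volume setting and the intermittent alarm for unsolicited messages.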

Display Controls. There are four display controls in the
upper right hand portion of the keyboard. These controls are
described as follows:

On-Off - The on-off switch requires a passkey to
operate. It applies power to the Uniscope terminal
and puts it in an operating state.

Focus - This control is used to focus the characters
on the display screen.

Louder - This control varies the volume of the audible
alarm.

Brighter - This control varies the brightness
of the characters being displayed.

Display Speed. The display generates five lines per second,
or 320 characters per second, at the rate of 3200 words per minute.
The transmission rate is dependent on the transmission rate of the
modem equipment.

Display Memory. The display has a 1,024 character memory
which transmits 256 characters at a time to the buffer for printing.

2.  The  Remote  Buffer 

The remote buffer acts as a control unit for the pagewriter.
It provides the interface between the display scope and the pagewriter
and holds display data, transmitted from the display for printing,
in storage memory for the pagewriter. Data from the display scope
memory is released to the buffer storage memory at the rate of 256
characters at a time.

3. The Pagewriter

The operation of the pagewriter is similar to that of a
teleprinter in that it produces hard copy of incoming messages. The
pagewriter is not program controlled, and a release key on the display
keyboard must be depressed to activate it for a hard copy of the
screen display. The pagewriter is mounted on a free-standing pedestal
which houses the pagewriter's power supply.

Printing Speed. The pagewriter operates at a speed of 1500
characters  per  minute  with  an  average  of  250  words  per  minute  at  the 
rate  of  50  lines  per  minute. 
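
The quoted printing rates can be cross-checked with simple arithmetic. The Python sketch below uses only figures stated in this section; the time to print a full screen is a derived estimate that assumes all 1,024 display positions are printed and ignores any line feed overhead.

```python
CHARS_PER_MINUTE = 1500
WORDS_PER_MINUTE = 250
LINES_PER_MINUTE = 50
SCREEN_CHARS = 64 * 16   # 1,024-character display memory
BLOCK_CHARS = 256        # characters released to the buffer at a time

chars_per_word = CHARS_PER_MINUTE / WORDS_PER_MINUTE       # implied word length
avg_chars_per_line = CHARS_PER_MINUTE / LINES_PER_MINUTE   # implied line length
blocks_per_screen = SCREEN_CHARS // BLOCK_CHARS            # buffer transfers per screen
seconds_per_full_screen = SCREEN_CHARS / (CHARS_PER_MINUTE / 60)
```

On these figures a word is counted as six characters, the average printed line is 30 characters, a full screen reaches the buffer in four transfers, and printing it takes roughly 41 seconds.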

Operator Controls and Indicators. Operator controls and
indicators consist of six push-button switch/indicators and a mechanical
adjustment control. Controls are described as follows:

Off Switch/Indicator - Depressing this switch/indicator
turns DC power off and lights the indicator.

On Switch/Indicator - Depressing this switch turns DC
power on and lights the indicator.

Ready Switch/Indicator - Depressing this switch generates
a general clear and ready signal to the remote buffer
and lights the indicator.

Error Indicator/Form Feed Switch - The indicator lights
to indicate a data error, right margin error, or
missing "remote" data carrier. Depressing the
switch feeds paper until the switch is released.

Change Ribbon Indicator/Forms Out Indicator/Switch -
The Change Ribbon indicator lights when the ribbon
reversal counter equals zero, indicating a ribbon
change condition. The Forms Out indicator lights
when the unit is out of paper. Depressing the
switch when the indicator is lit clears the audible
alarm.

Test Switch/Indicator - Depressing this switch prints
test characters (8's) and lights the indicator.

Print Phasing Thumbwheel - This adjustment positions
ribbon for optimum printing legibility.

C. SEARCH CAPABILITIES
1. Data Files

The WUIS data base consists of a Summary Data File (Direct
File) and an Index File (Inverted File) which are monitored periodically
by off-line file maintenance programs to insure current,
accurate data and file compatibility.

a. Summary Data File (Direct File/D.F.)

The Summary Data File (Direct File) is the master file
and contains all the data fields pertinent to each WUIS summary.
The file is organized by accession number; all the information for
a given summary is located directly under the accession number of
the summary. All Direct File data available to the user are described
in the "RD&T Work Unit Information System Data Input Manual,"
DSAM 4185.5.

b. Index File (Inverted File/I.F.)

The  Index  File  (Inverted  File)  is  a  retrieval  file  which 
consists  of  selected  data  fields  extracted  from  the  Summary  Data  File 
(Direct  File).  The  data  fields  available  for  searching  by  means  of 
the  Index  File  are  as  follows: 

1498                                                      Inverted File
Field Nos.  Field Description                             Roles (DFID)

1           Agency Digraph                                04
4           Kind of Summary                               05
5           Security of Summary                           58
10a         Primary Program Element No.                   06
            Primary P.E. (1st 2 characters)               51
            Primary P.E. (1st 3 characters)               52
            Primary Project No.                           07
            Primary Project & Task No.                    08
            Primary Project, Task, & Work Unit No.        09
10b         First Contributing P.E.                       10
            First Contributing P.E. (1st 2 characters)    53
            First Contributing P.E. (1st 3 characters)    54
            First Contributing Project No.                11
            First Contributing Project & Task No.         12
10c         Second Contributing P.E.                      13
            Second Contributing P.E. (1st 2 characters)   55
            Second Contributing P.E. (1st 3 characters)   56
            Second Contributing Project No.               14
            Second Contributing Project & Task No.        15
12          Scientific/Technical Area (COSATI Code)       20
15          Funding Agencies (all)                        21
16          Performance Method                            23
17b         Contract/Grant Number                         25
17c         Contract/Grant Type                           24
17d         Contract/Grant Partial Code                   57
17e         Contract/Grant Kind of Award                  22
19c         Responsible Individual                        28
19d         State/Foreign Country Code                    27
            State plus Congressional District Code        41
            DOD Organization Source Code - Six digit      26
              identification code of all WUIS source names
20c         Principal Investigator                        31
20*         Principal Investigator Social Security No.    16
20f         Associate Investigator (First)                32
20g         Associate Investigator (Second)               33
            State/Foreign Country Code                    30
            State plus Congressional District Code        42
            Performing Organization Type Code             34
            Performing Organization Source Code           29
22          Keywords
37          Descriptors - DDC assigned descriptive terms
38          Identifiers - DDC assigned terms denoting
              specific equipments, projects, etc.

The Inverted File is organized by type of data field and in accession
number sequence within each data field.
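
The relationship between the two files can be pictured with a small Python sketch. It is only an illustration: the accession numbers and field values are invented, the dictionary layout is an assumption and not the DDC storage format, and only two role codes from the table (04, Agency Digraph, and 31, Principal Investigator) are used.

```python
from collections import defaultdict

# Direct File: accession number -> all data fields of the summary.
# Accession numbers and field values are invented for illustration.
direct_file = {
    "DA000101": {"04": "DA", "31": "SMITH J"},
    "DN000102": {"04": "DN", "31": "JONES R"},
    "DN000103": {"04": "DN", "31": "SMITH J"},
}


def build_inverted_file(direct):
    """Group accession numbers by (DFID, field value), in accession order."""
    inverted = defaultdict(list)
    for accession in sorted(direct):
        for dfid, value in direct[accession].items():
            inverted[(dfid, value)].append(accession)
    return dict(inverted)


inverted_file = build_inverted_file(direct_file)
```

A lookup such as `inverted_file[("04", "DN")]` then returns the accession numbers for that field value without reading every summary.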
c. Special File

The  Special  File  is  a  temporary  file  created  by  the 
user for temporary storage of selected accession numbers resulting
from  searches.  Its  purpose  is  to  allow  the  merger  of  the  results 
of  several  separate  search  strategies  so  that  their  output  may  be 
combined  into  one  result  for  batch  processing.  Duplicate  answers 
which  result  from  combined  search  results  are  deleted  as  part  of 
batch  processing. 

2.  Inverted  File  Search  Capabilities 

The Inverted File is made available for searching by means
of Function Code "S". The following search qualification options
are available for use on the Inverted File:

a. Hierarchy (Generic Help)

The hierarchy or generic help search qualification option
is based on thesaurus structure. A thesaurus is primarily a dictionary
of synonyms, and serves as a guide to the selection (and spelling) of
a term which, among a group of synonymous terms, has been authorized
as the descriptor for indexing the concept involved. In addition
to guiding the user to the authorized descriptor among a group of
synonyms, the thesaurus also lists under each authorized descriptor
other authorized descriptors which are generically or closely related
to it. The purpose of listing generic and related terms under each
authorized descriptor is to guide the thesaurus user to the most
specific authorized descriptor available for a given concept.

The  designation  of  generic  relationships  is  based  on  the 
hierarchical  structuring  of  descriptors  within  families  and  indicates 
a  broader  or  narrower  relationship  between  descriptors.  A  broader 
term  is  a  class  term;  for  example,  iron  alloys  is  a  broader  term 
than  steels.  A  narrower  term  is  the  reciprocal  of  a  broader  term 
and  refers  to  a  term  that  is  a  member  of  a  class.  For  example,  the 
terms 'steels', 'gray iron', and 'mottled iron' are narrower terms
of  the  class  or  broader  term,  iron  alloys.  Iron  alloys  is,  in  turn,  a 
member  of  the  class  'metals',  and  is  listed  as  a  narrower  member 
or  term  of  the  class  'metals'. 

This  highly  organized  structuring  of  generic  relationships 
between  authorized  thesaurus  descriptors  results  in  a  powerful  and 
efficient retrieval option for expanding the scope of a search.
Combining the symbol for the hierarchical search option with an
authorized thesaurus descriptor will automatically expand the search
to include not only all accessions posted to the descriptor but also
all accessions posted to all descriptors designated by the thesaurus
as generically narrower to the original descriptor.

Since the DDC Thesaurus is not available as an on-line
file, the narrower terms themselves are not actually used to retrieve.
Instead, a distinct descriptor is sought, i.e., $ plus descriptor.
This is a separate descriptor distinguished from the normal descriptor
by the preceding ($). ($) plus descriptor postings for broader class
terms are automatically generated whenever a narrower class term is
posted. Thus the combination of the hierarchical search option qualifier,
($) plus descriptor, is actually a request to retrieve all
accessions posted to the term by indexers and all accessions posted
automatically to the descriptor as a result of its narrower term
posting.
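
The automatic "$" postings can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each descriptor is given a single broader term (taken from the iron alloys example above), the accession numbers are invented, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed single broader term per descriptor, from the example in the text.
BROADER = {
    "STEELS": "IRON ALLOYS",
    "GRAY IRON": "IRON ALLOYS",
    "MOTTLED IRON": "IRON ALLOYS",
    "IRON ALLOYS": "METALS",
}

postings = defaultdict(set)   # descriptor, or "$" + descriptor, -> accessions


def post(descriptor, accession):
    """Post an accession to a descriptor and to the "$" entries of broader terms."""
    postings[descriptor].add(accession)
    broader = BROADER.get(descriptor)
    while broader is not None:
        postings["$" + broader].add(accession)
        broader = BROADER.get(broader)


def hierarchical_search(descriptor):
    """Accessions posted by indexers plus those generated from narrower terms."""
    return postings.get(descriptor, set()) | postings.get("$" + descriptor, set())


post("STEELS", 101)        # accession numbers are invented
post("IRON ALLOYS", 102)
post("METALS", 103)
```

Because the broader-term postings are generated at indexing time, the search itself never consults the thesaurus, which matches the statement that the thesaurus is not held on-line.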

b.  Weighting  Option 

When a document is indexed, the descriptors representing
the main subjects are indicated and differentiated from the set of
associated descriptors by preceding the main subject descriptors with
an asterisk (*) or weighting symbol.

Conversely, the search analyst may wish to increase the
relevance of his answers by limiting his search to main subject entries
by means of the weighting option. The weighting option is initiated
by preceding a search term with an asterisk (*) weighting symbol and
is effective only when used in conjunction with authorized thesaurus
descriptors.

c. Masking Option

Weighting and hierarchical options are limited to Inverted
File searches involving authorized descriptors. The masking or
prefix option is not limited to the thesaurus vocabulary and allows
the use of shortened terms. A search using the masking option will
match the field value of its entry against every field value on the
Inverted File except dollar amounts and dates. To avoid needless
and time-consuming searches, the computer checks the length of the
field values associated with a masking option operator. Any masked
field value of four characters or less generates a scope display
message cautioning the user and halts the search initiation until
an override operator is submitted by the user.
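
The length check on masked terms can be sketched in Python as follows. The exception class, function name, and `override` parameter are hypothetical stand-ins for the scope display message and the override operator; the sample field values are invented.

```python
class ShortMaskCaution(Exception):
    """Raised where the terminal would display its cautionary message."""


def masked_search(field_values, mask, override=False):
    """Return the field values that begin with the masked (shortened) term."""
    if len(mask) <= 4 and not override:
        raise ShortMaskCaution(f"mask {mask!r} has four characters or less")
    return [value for value in field_values if value.startswith(mask)]


values = ["RADAR", "RADAR ANTENNAS", "RADIO", "SONAR"]
hits = masked_search(values, "RADAR")
```

A mask such as "RAD" is halted until the call is repeated with `override=True`, mirroring the caution-then-override sequence in the text.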
d. Term Role (Field Code) Option

This option is used to search known data fields on the
Inverted File, other than descriptors, identifiers, or keywords.
The data fields on the Inverted File are identified by two-digit
field codes (Data Field Identifiers) or "role" codes. An Inverted
File data field search is implemented by preceding the field value
to be searched with a ? symbol (operator) and the appropriate two-digit
"role" code (DFID).

e. Boolean Connectives

Once the search terms and the options or qualifiers
to be used in conjunction with them have been selected, they are
assembled into a search pattern or logical sequence by means of
the normal Boolean connectives (AND, OR, NOT).

Boolean "OR". The "OR" connector is the most basic of
Boolean connectors and is so frequently used that it is presumed
to be present whenever two or more search terms are listed without
an intervening Boolean connector. An "OR" logic pattern will
retrieve the sum of all hits which satisfy any one of a set of
alternative search terms. Duplicate hits are merged in the search
result statistics.

Boolean "AND". The "AND" connector is used for coordinated
search strategies and requires the satisfaction of at least
two separate conditions before an answer is accepted as a hit. The
number of conditions which must be met is always one more than the
number of "AND" connectors used.

Boolean "NOT". A Boolean "NOT" connector is used for
exclusion. Answers which include any of the conditions listed under
the "NOT" connector will be excluded from the final search result.
The "NOT" connector is usually used as a final condition of an "OR",
"AND", or "AND/OR" search logic pattern.

Combination Boolean "AND/OR". The use of an "AND"
connector between sets of alternative "OR" search terms combines
the advantages of both "AND" and "OR" logic. Its use permits
considerable expansion of search coverage, particularly when
hierarchical and/or masking qualifiers are used with the search
terms involved.
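
In set terms, "OR" is a union of hit lists, "AND" is an intersection, and "NOT" is a difference. The Python sketch below shows an "AND/OR" pattern with a final "NOT" under that reading; the function name, descriptors, and accession numbers are invented for illustration.

```python
def boolean_search(or_groups, not_terms, postings):
    """AND together groups of OR'ed terms, then exclude hits on the NOT terms."""
    result = None
    for group in or_groups:
        # Terms within a group are alternatives; duplicate hits merge in the set.
        hits = set().union(*(postings.get(term, set()) for term in group))
        result = hits if result is None else result & hits
    excluded = set().union(*(postings.get(term, set()) for term in not_terms))
    return (result or set()) - excluded


postings = {
    "RADAR": {1, 2, 3},
    "SONAR": {3, 4},
    "ANTENNAS": {2, 3, 4, 5},
    "TEST EQUIPMENT": {3},
}

# (RADAR OR SONAR) AND ANTENNAS NOT TEST EQUIPMENT
hits = boolean_search([["RADAR", "SONAR"], ["ANTENNAS"]], ["TEST EQUIPMENT"], postings)
```

One "AND" connector joins two groups, so two conditions must be met, in line with the rule that the number of conditions is one more than the number of "AND" connectors.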

3. Selected Direct File Search Capability

The search analyst may wish to further refine the results
of an Inverted File search by qualifications on data fields not
available for searching on the Inverted File. The "Q." function
code opens the Direct File data base of documents selected by a
prior Inverted File search and permits them to be searched by means
of comparison values.

Direct File data fields may be qualified by the following
comparison value symbols:

EQ = Equal

NE = Not equal

LE = Less than or equal

GE = Greater than or equal

GT = Greater than

The search analyst may further qualify prior Inverted File
search results by comparison value searches on Direct File data
fields such as dates or funding. The documents which do not meet the
comparison values used are eliminated from the search. A final search
statistic is displayed and the accessions which meet the qualification
are available for further manipulation by other system capabilities.

Boolean connectors are available for use in conjunction with
Direct File comparison values. Direct File comparison value searches
must be formatted according to a format formula. Error messages are
displayed if either a comparison value symbol or a Direct File data
field identifier code is illegal.
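
A Direct File qualification of this kind can be sketched in Python. The sketch covers only the five comparison symbols listed above; the function name, the field name "funding", and the sample records are assumptions, and a field absent from a record is treated as an illegal identifier for simplicity.

```python
import operator

COMPARISONS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "LE": operator.le,
    "GE": operator.ge,
    "GT": operator.gt,
}


def qualify(accessions, direct_file, field, symbol, value):
    """Keep the accessions whose Direct File field satisfies the comparison."""
    if symbol not in COMPARISONS:
        raise ValueError(f"illegal comparison value symbol: {symbol}")
    compare = COMPARISONS[symbol]
    kept = []
    for accession in accessions:
        record = direct_file[accession]
        if field not in record:
            raise ValueError(f"illegal data field identifier: {field}")
        if compare(record[field], value):
            kept.append(accession)
    return kept


# Invented records standing in for the results of a prior Inverted File search.
direct_file = {"A1": {"funding": 50}, "A2": {"funding": 120}, "A3": {"funding": 100}}
qualified = qualify(["A1", "A2", "A3"], direct_file, "funding", "GE", 100)
```

Only accessions passed in from the earlier search are examined, which reflects the role of the qualification step as a refinement, not a fresh search.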

D.  TOTAL  SYSTEM  CAPABILITIES 

In  addition  to  its  search  capabilities  the  DDC  Remote  On-Line 
Retrieval  System  has  many  other  system  capabilities  which  can  be 
called  into  use  by  their  assigned  function  codes  to  act  upon  search 
results  or  to  aid  the  user  in  formulating  his  search  strategy  or 
search  display. 

The  total  list  of  system  capabilities  may  be  divided  into 
three  types:  action  codes,  auxiliary  codes,  and  service  codes. 

1. Action Codes

These codes identify system capabilities which initiate and
produce a user-sought response. The S., Q., and 7. function codes
belong in this category as well as all action requests for search
result display, sorting, and transfer to off-line printing and
sorting.

Each file, Inverted, Direct, and Special, has its own set
of display, sort, and batch transfer action codes. All three sets
of file action codes which act upon search results have the same
functions and capabilities, but are file dependent in their implementation.

Display Actions. Action codes which redisplay search questions
and search statistics can be implemented only in conjunction with
Inverted or Direct File searches. The Special File is a storage file
of selected search results and does not need this type of redisplay.
All other search result action codes may be implemented on any of the
three files.

Common to all three files is the ability to display either
search result accession numbers or the full display of each search
result work unit. Search result work unit displays may be formatted
according to standard display patterns or tailored to display any
field in any format desired. Answers may be viewed on the scope
either in a continuous mode with no pauses between items displayed
or in a non-continuous single frame mode with page changes on request.
Accession sequence may be requested in either ascending or descending
order. Forward or backward browsing of work unit displays is available
in a non-continuous mode and includes a skip value option which
may be set from a value of 2 to 9999 as determined by the user.
Display formats and browsing modes may be changed at will between
item displays.

Sorting Actions. Sorting actions are limited and dependent
on whether the items to be sorted are for scope display and terminal
printing, or for off-line batch processing. A maximum of three sort
fields may be specified for terminal display or printing. The maximum
sort for off-line batch processing is four sort fields.

Off-Line Batch Printing of Search Results. Eighteen printing
formats are available for off-line batch printing requests from terminal
users. A maximum of six printouts per search result is available.
The kinds of printouts may be any combination of copies per format or
number of formats not exceeding a total of six. Sorting of off-line
batch printing is limited to four fields. Accession number sequence
is always in descending order.
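
The limits on an off-line batch printing request can be expressed as a simple validation step. The Python sketch below is illustrative only: the function name and message wording are hypothetical, and the numbering of the eighteen formats as 1 through 18 is an assumption.

```python
MAX_PRINTOUTS = 6
MAX_BATCH_SORT_FIELDS = 4
FORMAT_COUNT = 18


def validate_batch_request(copies_by_format, sort_fields):
    """Return a list of problems; an empty list means the request is acceptable.

    copies_by_format maps a format number (assumed 1-18) to a copy count.
    """
    problems = []
    for fmt in copies_by_format:
        if not 1 <= fmt <= FORMAT_COUNT:
            problems.append(f"unknown format {fmt}")
    if sum(copies_by_format.values()) > MAX_PRINTOUTS:
        problems.append("more than six printouts requested")
    if len(sort_fields) > MAX_BATCH_SORT_FIELDS:
        problems.append("more than four sort fields requested")
    return problems
```

Two copies of one format plus four of another pass the check, since the total of six may be split between copies and formats in any combination.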

Display of a Known Accession Number. If an accession number
is known, its summary may be called for display by use of the unique
action code (W.). Action code (W.) acts on a single accession number.
It can be used to reference the summary of a known work unit or to
view an accession listed as one of the results of a search request.
Accessions called into display by action code (W.) may be separately
transferred to batch off-line processing of single work units in
multiple formats. If the user has a list of accession numbers,
each number may be requested individually by means of the (W.) code
and transferred, individually, to the Special File where the accession
numbers may be accumulated and transferred to batch processing as
a package.

2.  Auxiliary  Codes 

The three auxiliary codes, P, Y, and N, are used to respond
to  scope  display  messages  concerning  display  or  search  continuation. 


3. Service Codes

Service codes are used to call reference data displays such
as display formats and data field code identifications,
and to reference examples of how to formulate search, sort, or batch
requests.

In summation, the DDC Remote On-Line Retrieval System capabilities
cover every operation needed for information retrieval and display.
A complete list of system capabilities follows.


Action Codes

S     -  Implement I.F. Search
SE.   -  Redisplay I.F. Search Statistics
SO.   -  Redisplay I.F. Search Question
A.    -  Display Accession Number List From I.F. Search
AD    -  Display WUIS Items From I.F. Search
AB.   -  Transfer I.F. Search Results To Batch System
AST.  -  Sort Inverted File Search Results
Q.    -  Implement D.F. Qual Search Using I.F. Search Results
QF.   -  Redisplay D.F. Qualification Search Statistics
QQ.   -  Redisplay D.F. Qualification Search Question
X     -  Display Accession Number List From D.F. Qual Search
XD    -  Display WUIS Items From D.F. Qualification Search
XB.   -  Transfer D.F. Qual Search Results To Batch System
XST   -  Sort Direct File Qualification Results
TZ    -  Build Special Accession Number File
Z.    -  Display Accession Number List From Special File
DZ    -  Display WUIS Items From Special File
ZB    -  Transfer Special File Results To Batch System
ZST   -  Sort Special File Items
W.    -  Display Single Known WUIS Item

Auxiliary Codes

P.    -  For Paging Screen Displays
Y     -  For A Yes Response Or To Ignore The Message And
         Proceed With The Function
N     -  For A No Response

Service Codes

FD    -  Request CRT Item Format Change/Continuous Display Mode
LL    -  Display Site Activity Log
SS    -  Display Sample I.F. Search Pattern
ST    -  Display Sample Sort Request
SQ    -  Display Sample Qualification Pattern
SB    -  Display Sample Batch Interface Request
l     -  Display D.F. Fields/I.F. Role Numbers
y     -  Display CRT Item Formats
C.    -  Display of Total System Capabilities
Oft.1 -  User Guide to Sequential System Processing


E. CRITIQUE


The DDC Remote On-Line Retrieval System has been very successful.
It allows the user to interact with the data base and modify his
query based on feedback. It does this quickly, easily, and relatively
cheaply in terms of computer time costs. With this new system DDC
has overcome the basic weaknesses of batch processing. No longer
is there a need for intermediaries who may distort or misinterpret
the user's intention. The rigidities inherent in batch processing,
which make query modification difficult, have been eliminated. The
time saved is immense, and user confidence in the relevance of his
answer is much improved.2

Despite well merited praise, there are flaws in the present DDC
On-Line System. Two of them are major: lack of an on-line thesaurus
file, and lack of system reliability. Of these two, the lack of an
on-line thesaurus file is the more hampering to the user. Despite
printed reference aids, the user must spend a great deal of time
selecting appropriate descriptors to use in his search. Without a
thesaurus, his effort to discover the right synonyms and available
narrower terms is a hit or miss process. This process of discovery
can be time-consuming, exasperating, and misleading. There is a
very real need for an on-line thesaurus file to guide the user to
the actual terms which have been used to index his areas of interest.
DDC, by its omission of access to an on-line thesaurus file, has
seriously curtailed the system's utility for subject term searches.

System reliability, the other major problem, is a constant one
with any new computer system, particularly one which has been subject
to as much revision as has the DDC Remote On-Line Retrieval System.
Even so, down-time should not be as much of a daily hazard in the
life of the user as it has been and continues to be with respect to
both hardware and system operations.

Another area which should be improved is off-line batch request
capabilities. At present this area of system capabilities is rigid
and limited. Although 18 print formats are available, seldom do any
of the 18 formats, other than the 1498M format, meet the listing
requirements of a NARDIS request. A free form report generator
capability similar to that available for special on-line display
formats should be made part of the batch request capability. Sorting
ability, which is now limited to four fields for batch processing
and three fields for on-line displays, should be expanded to a larger
number of fields for either capability. And, finally, the
report generator should have a summing capability for at least three
fields in order to generate summary report requests for on-line
display or batch printing.

III. THE NATIONAL AERONAUTICS AND SPACE ADMINISTRATION RECON SYSTEM3

A.  INTRODUCTION 

RECON, an acronym which stands for REmote CONsole, is the name
of the real-time, on-line, time-shared, information retrieval system
used by the National Aeronautics and Space Administration.

RECON was started in February, 1969, and now has 21 terminals
installed at various NASA centers throughout the United States.

All RECON terminals are linked by leased telephone lines to an
IBM 360, Model 50, computer at the NASA Scientific and Technical
Information Facility in College Park, Maryland.

B.  EQUIPMENT  DESCRIPTION 

Each RECON terminal consists of the following pieces of equipment:

1. CRT Display

The present RECON CRT is a portable model with an 8 x 12
inch display screen.

2. Keyboard

The keyboard, also portable, is connected to the CRT
display by electric lead wires. The separation of the keyboard from
the CRT display allows for easy positioning of either unit to suit
the user's needs. The keyboard consists of standard electric
typewriter keys, plus special function keys, and cursor control and
editing keys.

3. Printer

The RECON printer is a standard Western Union teletype
machine. Its printing speed is 15 characters per second (900
characters per minute). It operates in a receive only mode and prints
on operator or remote computer command.

4.  Control  Unit 

The  control  unit  contains  a  data  modem  and  a  CRT  printer 
buffer  unit.  The  data  modem  controls  signal  transmissions  to  and 
from  the  remote  computer.  The  CRT  printer  buffer  unit  controls 
signal  transmission  between  the  CRT  display  and  the  printer. 

5. Response Time

With all twenty-one terminals in simultaneous use,
message response time to and from the computer is usually ten to
twelve seconds or less. While RECON may claim priority, if necessary,
for computer access, it usually operates in a multiprogram
environment which permits computer access to other programming requests.

C.  SEARCH  CAPABILITIES 

The main data base available to RECON users is a bibliographic
citation data bank describing the more than 600,000 documents housed
at the NASA Scientific and Technical Information Facility at College
Park. Each document citation includes catalog data, such as author,
title, report number, date, publisher, and project number, i.e., the
standard reference data used in bibliographic citations. Each document
citation also includes the keywords assigned by the author of the
document, the index terms assigned by the NASA indexing staff, and
special identifiers assigned by the author or indexer. Textual data,
such as abstracts or report summaries similar to those available on the
DDC On-Line System, are not included as part of the NASA bibliographic
citation data base. The purpose of the report citation file is to
retrieve bibliographic citations to literature which may be obtained
in microfiche form in the libraries and offices where RECON terminals
have been installed. Since the reports themselves are easily
available, summaries and abstracts are not needed.

1. Data Base

The data base comprises a linear file, an inverted file, and
a thesaurus file. Formalized catalog data entries such as author,
dates, publishers, project numbers, etc., are assigned identifier
codes on the inverted and linear files and may be searched in much
the same fashion as similar catalog data are searched on the DDC
Inverted and Direct Files.

2.  Subject  Searching 

The main difference between the search capabilities available
to DDC On-Line users and RECON users is in their subject search
capability. The DDC vocabulary available for subject searches is based
on three sources: DDC thesaurus authorized descriptors assigned by
the DDC staff; keywords assigned by resume authors; and special
identifiers assigned by either DDC indexers or resume authors. The
same three types of postings are made for the bibliographic citations
on the NASA file. The difference between the two term files is that
only part of the DDC term file is selected from a controlled thesaurus
vocabulary, whereas all term postings (keywords, identifiers, and
index terms) on the NASA file are selected from the controlled
vocabulary of the NASA thesaurus.

It is this factor, the thesaurus controlled vocabulary, that
distinguishes the RECON system and gives its user a great advantage
over the DDC On-Line user. The DDC On-Line user may employ the
hierarchical option only for authorized thesaurus descriptors.
Appropriate items posted to unauthorized descriptors are not
available for hierarchical retrieval. In contrast, because of its
controlled vocabulary, all descriptors on the RECON data file are
available for hierarchical searching regardless of their initial
assignment source, i.e., original author or NASA indexer. As a result,
a RECON user may expand his search to include hierarchically narrower
terms and is assured of complete coverage of any type of descriptor
which may have been assigned to any citation.

3.  The  On-Line  Thesaurus  and  Usage  File 

Another major feature of the RECON system is its on-line
thesaurus and usage file. Unlike the DDC On-Line user, the RECON
user does not have to locate the appropriate subject terms for his
search on a hit-or-miss basis. The RECON user merely enters the
term he thinks is appropriate. His term is matched against the
thesaurus and the corresponding authorized thesaurus term is
displayed along with a count of the number of citations which have
been posted to it. To determine the narrower terms for
a previously selected term, the RECON user enters an expand command,
and the narrower terms and their usage counts are displayed below
the original term and its usage count. The user may then either
discount the narrower terms as part of his search formulation or
indicate his wish for their inclusion in his search.
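
The lookup and expand steps just described can be illustrated with a short Python sketch. It is not RECON's implementation: the thesaurus entries, the usage counts, and the names `lookup` and `expand` are all invented for illustration. It shows only the two behaviors the text describes, mapping an entered term to its authorized term with a usage count, and listing narrower terms with their counts.

```python
# Hypothetical miniature thesaurus. Terms and counts are invented.
AUTHORIZED = {
    "lasers": {"narrower": ["gas lasers", "ruby lasers"], "count": 120},
    "gas lasers": {"narrower": [], "count": 45},
    "ruby lasers": {"narrower": [], "count": 30},
}
# Entry terms a user might type, mapped to the authorized term.
ENTRY_TERMS = {"laser": "lasers", "optical masers": "lasers"}


def lookup(term):
    """Return (authorized term, usage count), or None if nothing matches."""
    key = term.lower()
    authorized = ENTRY_TERMS.get(key, key)
    if authorized not in AUTHORIZED:
        return None
    return authorized, AUTHORIZED[authorized]["count"]


def expand(authorized):
    """Return the narrower terms of an authorized term with usage counts."""
    narrower = AUTHORIZED[authorized]["narrower"]
    return [(term, AUTHORIZED[term]["count"]) for term in narrower]
```

With this sample data, entering "Laser" resolves to the authorized term "lasers", and expanding it lists "gas lasers" and "ruby lasers" for the user to include or discount.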

This elegant and efficient location and display of appropriate
search terms, their usage, and available hierarchical search
expansion capabilities is a major milestone in the development of
on-line information retrieval systems. It represents a great
advantage available to any system which uses a well-developed,
thesaurus-controlled vocabulary for indexing and retrieval.

4.  Boolean  Connectives  and  Query  Language 

The  Boolean  connectives  used  in  RECON  are  not  words,  but 
a  combination  of  function  keys  and  a  mathematical  formula.  Symbols 
are  used  to  represent  the  Boolean  connectives  between  sets  of  terms 
and  numbers  are  used  to  indicate  subsets  of  terms.  The  answer 
response  is  a  final  subset  number  which  includes  only  those  items 
which meet the parameters of the formula. This method of indicating
search levels and Boolean connectives may be efficient, but it is
not clear or easy to use without added instruction.
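
A formula over numbered sets of this kind can be sketched in Python as follows. The symbols are an assumption, since the text does not list the ones RECON uses: here `*` stands for AND, `+` for OR, and `-` for NOT, evaluated strictly left to right. The function name `combine` and the sample accession labels are likewise hypothetical.

```python
import re


def combine(sets, formula):
    """Evaluate a formula such as "1*2-3" left to right over numbered sets.

    Assumed symbols (not RECON's documented ones): * = AND, + = OR,
    - = NOT. The result is stored under the next free set number,
    and that number is returned.
    """
    tokens = re.findall(r"\d+|[*+\-]", formula.replace(" ", ""))
    if not tokens or len(tokens) % 2 == 0 or not tokens[0].isdigit():
        raise ValueError("formula must alternate set numbers and symbols")
    result = set(sets[int(tokens[0])])
    for symbol, number in zip(tokens[1::2], tokens[2::2]):
        if symbol not in "*+-" or not number.isdigit():
            raise ValueError("formula must alternate set numbers and symbols")
        operand = sets[int(number)]
        if symbol == "*":
            result &= operand   # keep items present in both sets
        elif symbol == "+":
            result |= operand   # keep items present in either set
        else:
            result -= operand   # drop items present in the operand
    new_number = max(sets) + 1
    sets[new_number] = result
    return new_number
```

The sketch makes the criticism concrete: a user who wants "set 1 and set 2 but not set 3" must remember both the set numbers and the symbols before typing `1*2-3`.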

The lack of language as a means of communication with
the computer is most conspicuous in query formulation, and is a
major fault throughout the RECON system. Despite the efficiency
and elegance of its retrieval capabilities, RECON's use of function
keys and numbers for dialog between the user and computer reduces its
communication ability. DDC's on-line ability to recognize words in
place of function keys is not highly developed, but compared to
RECON's lack of language, DDC's query language represents a great
advance in on-line user communication language and makes the DDC
on-line system much easier to use.

5. Display, Off-Line Batch, and Sort Capabilities

Since its data base does not include textual material,
RECON's ancillary display, sort, and batch capabilities are not as
varied as those available in the DDC on-line system. It does,
however, have the same capabilities as the DDC on-line system for
display, sort, and batch operations that do not involve textual
data. RECON also has a special file, as does the DDC on-line system,
for selected answer retention and manipulation.

D.  CRITIQUE 

NASA's RECON system has been in successful operation since
1969. Originally limited to service in the Washington, D.C. area,
it now has nation-wide coverage and gives users who are thousands
of miles apart simultaneous, equal, efficient, and prompt access
to the NASA bibliographic citation file. It was one of the
earliest on-line information retrieval systems developed and
quickly proved the superiority of on-line retrieval over query
batch processing procedures.

Although the DDC On-Line System and the NASA RECON System have
many similar search and display capabilities, their data bases are
very dissimilar and the two systems should not be considered as
comparable to each other in their retrieval capabilities.

The RECON data base, though very large, is relatively simply
structured in comparison to the DDC WUIS data base. Each WUIS
accession requires approximately three times as many data field
entries as a RECON bibliographic accession. The RECON file is
cumulative; once a bibliographic citation is entered on the file,
no further data manipulation is required. The DDC WUIS file is
also cumulative, but each accession and its accompanying hundred
or more data fields is subject to constant revision and replacement.
The search capabilities for either data base are similar, but the
amount of retrievable information for each accession is much greater
in the DDC System than the NASA System. On the other hand, the NASA
data base contains a vastly greater number of searchable accessions
than the DDC data base. Both retrieval systems represent milestone
achievements in on-line information retrieval development.

IV. THE TECHNICAL INFORMATION PROGRAM (TIP) OF THE LIBRARIES OF
THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY*

A. INTRODUCTION

Because of its prestigious academic setting and extensive review
and discussion in professional journal literature, the Technical
Information Program (TIP) of the Libraries of the Massachusetts
Institute of Technology is probably the most widely known remote
access on-line information retrieval system. Developed in 1962 as
part of MIT's Project MAC (Multiple Access, Time-Shared Computers),
TIP's original purpose was to provide an on-line search and retrieval
capability for bibliographic data from physics journals. Since then
the Technical Information Program has been generalized to cover
multipurpose applications of its abilities to other types of data
files. The Technical Information Program continues its original
purpose and at present its data base includes bibliographic data
covering more than 30,000 articles from 38 physics journals.

The journal holdings cited start with the first issue of 1965
for each journal and the holdings are updated weekly. The file
contains the following information for each article:

i - identification (journal, volume, page)
t - title
a - author(s)
l - author's institutional connection
c - citations and bibliography

The literature is arranged by journal and volume. The search
language allows the location and retrieval of a set of article
citations that contain a given item or satisfy a set of conditions.

MIT's TIP differs from both DDC's Remote Access On-Line System
and NASA's RECON System in that it does not employ a CRT display.
TIP also has retrieval capabilities not included in the two
previously discussed systems.

B. EQUIPMENT DESCRIPTION

The Technical Information Program is one among a library of
programs available to the approximately 30 users of the Compatible
Timesharing System (CTSS) on the IBM 7090 computer used in M.I.T.'s
Project MAC. Access to the computer is by telephone lines which
link the IBM Selectric 2741 Consoles to the computer.

An IBM Selectric 2741 Console is a free standing unit with
three parts:

1.  Typewriter 

The typewriter is similar to the standard IBM Selectric
"golf-ball" typewriter. In fact, when the keyboard is not in use
as a remote console, the typewriter may be used as a regular
typewriter.

2. Control Unit

The control unit is located behind the typewriter and
contains Dataphone switching equipment and some computer circuits.
The control unit also contains a switch which enables the user to
use the typewriter either as a remote console keyboard or as a
regular typewriter.

3.  Telephone 

A telephone line to the MAC computer is obtained by
dialing into the system by a standard pushbutton telephone located
next to the typewriter keyboard. Access to a line is determined
by the number of concurrent users of the system. The maximum
limit of concurrent users is 30; priority is assigned in advance.

C. LANGUAGE

A distinctive and laudable feature of TIP is its use of
English words instead of role or code numbers or symbols for data
field identification. For example, to find the word "laser"
within a title, the user types the command:

Find title laser

While this is far from being a natural form of sentence structure,
it is a great deal easier to remember and formulate than having
to refer to a location code for title data field identification
or a format formula for field code and search value entry.

Users familiar with the TIP data file may use abbreviations
for commands, fields, and journal titles. Commands and fields
may be abbreviated to their first letter. Journal titles may be
abbreviated to standard citation forms, standardized six-letter
acronyms, or numeric codes.
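
The command form and the first-letter abbreviations can be illustrated with a small parser. This Python sketch is an illustration only, not TIP's code. Only the `find` command and the `title` field are confirmed by the example above; the other field names are taken loosely from the file contents listed in the introduction, and `location` in particular is an assumed name for the author's institutional connection.

```python
# Only "find" and "title" appear in the text's example; the remaining
# field names are assumptions based on the file's five kinds of data.
COMMANDS = ["find"]
FIELDS = ["identification", "title", "author", "location", "citations"]


def resolve(word, vocabulary):
    """Match a full word or its first-letter abbreviation, ignoring case."""
    word = word.lower()
    for entry in vocabulary:
        if word == entry or word == entry[0]:
            return entry
    raise ValueError(f"unrecognized word: {word!r}")


def parse(line):
    """Split a command line into (command, field, search value)."""
    parts = line.split(maxsplit=2)
    if len(parts) != 3:
        raise ValueError("expected: <command> <field> <value>")
    command, field, value = parts
    return resolve(command, COMMANDS), resolve(field, FIELDS), value
```

Under these assumptions `Find title laser` and the abbreviated `f t laser` parse to the same request, which is the convenience the text credits to experienced users.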

D. SPECIAL FILE CAPABILITY


Another TIP feature is its OUTPUT SAVE capability. Each
TIP user is allotted a portion of disk track storage space which
he may use to save the results of a search for future reference
and further searching. A saved file may be searched in the same
manner as the TIP library file. A TIP library user may make a
comprehensive search of the TIP library for a broad subject area,
save the search results in his own private file, and at his
convenience make refined searches on various aspects of the
subject. He can also save the subsearches as separate
subject files.

Saved files are not stored as part of the TIP library program
and can be manipulated by executive commands outside of TIP.
Having a subset of the TIP library as his own private file gives
the TIP user facilities for information retrieval, storage, and
manipulation to be envied.

E. DISPLAY AND OFF-LINE PRINTING

Any arrangement of the five fields available may be designated
by the user for on- or off-line printing. Sorting is automatically
alphabetic by journal title, author, and article title, and
numeric by journal volume and page. Off-line printing
is done outside the TIP program and requires a store file command
and a print request.

F. SEARCH CAPABILITIES AND CRITIQUE

The data base of TIP is very similar in structure and form
to that of RECON. Both data bases contain bibliographic data
without text. Both have similar search capabilities for Boolean
coupling of search terms and data fields in various combinations.
TIP, unlike RECON or DDC's On-Line System, does not rely on human
processing for the assignment of descriptors, keywords, or special
identifiers. TIP does not have a thesaurus or list of descriptors.
Instead of terms assigned by indexers, TIP relies on the words
included in article titles as a source of index terms. It can
do this because the article titles are taken from journals which
require their authors to use titles which fully reflect article
content. Since it cannot rely on an authorized list of descriptors
for spelling of prefixes, suffixes, or phrase arrangement, TIP
supplies its users with six options which they may employ to
define the object of a search or a condition to be satisfied.

The  types  of  search  options  are  as  follows: 

- Prefix Match - Everything beginning with the desired
prefix will be retrieved.

- Exact Match - Only terms which exactly match the
desired term will be retrieved.

- Exact Suffix Match - Only terms which end with the
suffix cited will be retrieved.

- Masked Suffix Match - By indicating the number of
participial, adjectival, or plural ending characters, a suffix
search may be expanded to include many variations of spelling
and meaning.

- Words in Any Combination - All titles containing the
terms cited, regardless of the order of their appearance within
a title, will be retrieved.

- Words in Exact Order - Only titles which contain the
terms cited in the exact order cited will be retrieved.
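
The six options can be read as six predicates applied to the words of a title. The Python sketch below is one plausible reading, not TIP's actual matching logic. Two points are assumptions: the masked suffix match is taken to mean a stem followed by at most a stated number of ending characters, and "exact order" is taken to mean adjacent words in sequence. All function names are hypothetical.

```python
def words(title):
    """Split a title into lower-case words on whitespace."""
    return title.lower().split()


def prefix_match(title, prefix):
    return any(w.startswith(prefix.lower()) for w in words(title))


def exact_match(title, term):
    return term.lower() in words(title)


def suffix_match(title, suffix):
    return any(w.endswith(suffix.lower()) for w in words(title))


def masked_suffix_match(title, stem, max_ending):
    """Assumed reading: the stem plus at most `max_ending` ending characters."""
    stem = stem.lower()
    return any(
        w.startswith(stem) and len(w) - len(stem) <= max_ending
        for w in words(title)
    )


def any_order_match(title, terms):
    present = set(words(title))
    return all(t.lower() in present for t in terms)


def exact_order_match(title, terms):
    """Assumed reading: the terms appear as adjacent words, in order."""
    title_words = words(title)
    wanted = [t.lower() for t in terms]
    n = len(wanted)
    return any(
        title_words[i:i + n] == wanted
        for i in range(len(title_words) - n + 1)
    )
```

For a title such as "Pulsed Ruby Laser Scattering in Plasmas", a masked suffix match on the stem "plasma" with one ending character accepts the plural, while the same stem-plus-two match on "scatter" rejects "scattering". That is how a user can admit plurals without admitting every derived form.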

This array of term search capabilities gives TIP
a very comprehensive and ingenious term retrieval facility.
In effect, it offers its user a word stem and text search
capability without encumbering the system with the elaborate
and large word stem dictionary and comparison tables usually
required to achieve either of these rare search capabilities.

To offset its lack of hierarchical search expansion capability,
TIP offers its user a unique capability made possible by the
formalities of its data base and data base source. Although
TIP's data base does not include index terms based on a
hierarchically structured thesaurus, it does include the references
cited in each article's bibliography. It is assumed that articles
on the same subject will have bibliographic references in common
even though their titles have no words in common. Therefore,
to obtain a search depth and expansion equivalent to hierarchical
search expansion, TIP allows its user to add to his search response
all articles whose references include the same references as the
article or articles obtained by his title term search. Expanding
literature search depth by checking reference commonality is a
venerable information retrieval technique, heavily used in manual
literature searches. It is seldom, however, employed in
computer-oriented information systems other than in the field of
jurisprudence, in which historical relevance and precedence are primary
concerns.
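
Expansion by shared references can be sketched in a few lines of Python. This illustrates the general technique rather than TIP's program: the function name, the `min_shared` threshold, and the sample article and reference labels are assumptions. References cited by the initial hits are pooled, and any article citing at least `min_shared` of them is added to the result.

```python
def expand_by_references(hits, references, min_shared=1):
    """Add articles sharing cited references with the initial hits.

    `references` maps an article label to the set of references it cites.
    An article is added when it cites at least `min_shared` references
    from the pooled reference lists of the hits.
    """
    cited_by_hits = set()
    for article in hits:
        cited_by_hits |= references.get(article, set())
    expanded = set(hits)
    for article, cited in references.items():
        if len(cited & cited_by_hits) >= min_shared:
            expanded.add(article)
    return expanded
```

With `min_shared=1` a single common citation is enough, which favors recall. Raising the threshold trades recall for precision, a choice the text leaves open.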

By relying on the formalities inherent in its data base, TIP
has achieved a comprehensive, efficient, inexpensive, and
easy-to-use information retrieval system. Unfortunately, TIP will
work  only  for  information  systems  which  share  its  data  base  input 
reliability.  The  context  of  TIP  is  physics,  a  field  of  information 
which  has  a  narrow  and  specific  vocabulary.  In  addition,  the 
physics  profession  closely  governs  its  literature  formalities. 

All  the  journals  included  in  the  TIP  data  bank  require  very 
similar  reference  citation  formats  and  title  reliability.  Few 
professions  have  achieved  the  same  commonality  of  form,  format,  and 
vocabulary  as  physics  has  in  its  journal  literature.  The  rigidity  of 
its  data  base  is  what  enables  TIP  to  achieve  its  ingenious  search 
capabilities.  It  is  this  same  requirement  for  data  base  rigidity 
that  has  prevented  the  universal  adoption  of  TIP  for  computerized 
information  retrieval  in  other  fields  of  professional  literature. 

V. SUGGESTIONS FOR EXPLOITING THE POTENTIAL OF ON-LINE REMOTE
ACCESS INFORMATION RETRIEVAL AND DISPLAY SYSTEMS

A.  INTRODUCTION 

The  ideal  remote  access  on-line  information  retrieval 
and  display  system  would  optimize  the  feedback  capabilities  of 
user-computer  interactions  to  achieve  the  fullest  possible  use  of 
the  computer  as  an  information  retrieval  and  display  tool.  Current 
on-line  information  retrieval  and  display  systems  are  mainly  on-line 
program  adaptations  of  batch  processing  techniques  and  do  not 
exploit  the  inherent  possibilities  of  on-line  user-computer 
interaction,  unhampered  by  the  rigidities  and  time  lag  of  batch 
processing  techniques. 5 

The  following  paragraphs  describe  some  of  the  main 
characteristics  and  capabilities  which  might  be  included  in  future 
on-line  information  retrieval  and  display  systems. 

B. BASIC TUTORIAL DISPLAY SEQUENCE5,6

The tutorial sequence should provide enough background and
instruction to train a user completely unfamiliar with the system.
It should explain the origin and content of the data base,
provide instruction in query formulation, and thoroughly describe
the system's capabilities, language, and limitations. In addition to
reference displays of sample queries, data field identification
tables, and display format models, the tutorial sequence should
include all the material now provided in the form of printed
instruction manuals. For the new user it should provide a
computer-aided instructional course in query formulation using a
sample data base to test the new user's skill and understanding of
the system before allowing him access to the full data base. The
importance of the tutorial sequence cannot be overstressed. It
provides the basis for user-computer interaction and by doing so
determines in great part the success and potential use of the system.
A good tutorial sequence will create user self-confidence, increase
the efficiency with which the system is used, and greatly expand the
system's marketability.

C. SYSTEM CONTROL LANGUAGE AND PATTERN RECOGNITION CAPABILITIES

Ideally the system user would be unaware that the language he
uses to communicate with the computer presents any problems at all.
In fact, the user would probably prefer oral communication as the
medium of interaction, and much work is being done at NSRDC to
develop this capability.* The user has no desire to learn a new
language in order to communicate his wishes to the computer. His
acceptance of the system is in large part determined by how little
he must adjust his normal means of communication in order to use
the system. The more natural the language, the greater will be the
use and acceptance of the system. Ideally, the user should
be able to address an on-line information retrieval system in the
same way he addresses a librarian: by a naturally formed
question or statement, such as, "What do you have on Henry VIII?",
or "How much money is being spent on laser research?". The
computer would then analyze the request, formulate and perform
the search, and present the answer. This sounds like
crystal-gazing, yet with on-line capabilities, it is actually
possible to a surprising degree, even at the present stage of
computer pattern recognition capability.

*The Speech Recognition Group in the Computation and Mathematics
Department of NSRDC is headed by Dr. S. Berkowitz.

The system control language and pattern recognition
capabilities  employed  in  a  user-computer  interaction  situation 
should  have  the  following  characteristics: 

-  The  system  should  provide  automatic  search  and 
display  formulation  capabilities  based  on  pattern  recognition, 
utilizing  interactive  user  interrogation  to  determine  automatic 
search and display pattern expansion or limitations. For
example,  if  a  user  submits  his  question  in  an  unacceptable  format, 
the  basic  tutorial  sequence  would  automatically  be  brought  in  for 
appropriate  instruction.  The  instruction  would  then  lead  the 
user  to  automatic  query  analysis  search  and  display  programs. 

The  automatic  query  analysis  programs  would  include  a  program  of 
computer  interrogations  to  elicit  from  the  user  the  parameters 
which  he  requires  but  omitted  from  his  initial  request  because 
he  did  not  understand  the  best  manner  of  query  formulation. 

- For users who wish to bypass the automatic query
and display programs, the system should provide two alternate
vocabularies for system commands and data field identifications.
The basic vocabulary should be English words which reflect command
actions or describe the data fields covered. A recognizably
abbreviated version of the basic vocabulary should be made
available as an alternate system control language for heavy users.
The system should be capable of accepting commands and data
identifications which combine both vocabulary forms.

- All query formulations should be automatically
analyzed for scope and logic. If the analysis determines that
the query formulation is questionable, it will automatically call
in the appropriate level of user-interview and user-instruction
programs. Ambiguities or questionable strategies would be
clarified by interactive explanatory interview sequences. On
resolution, the search would continue in its normal sequence.
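
The second characteristic above, a basic vocabulary and an abbreviated
vocabulary that may be mixed freely in one command, can be illustrated
with a minimal sketch. The command words, field names, and
abbreviations shown are hypothetical; the report does not specify any
particular vocabulary.

```python
# Hypothetical vocabulary: full English words and their abbreviations.
# None of these names come from the report; they are illustrative only.
FULL_TO_ABBREV = {
    "SEARCH": "SRCH",
    "DISPLAY": "DSPL",
    "PRINT": "PRT",
    "TITLE": "TI",
    "AUTHOR": "AU",
}
ABBREV_TO_FULL = {abbrev: full for full, abbrev in FULL_TO_ABBREV.items()}


def normalize(command: str) -> list[str]:
    """Map every command or field token to its full form.

    Tokens that belong to neither vocabulary (the search values
    themselves) are passed through unchanged.
    """
    tokens = []
    for token in command.split():
        upper = token.upper()
        if upper in FULL_TO_ABBREV:
            tokens.append(upper)
        elif upper in ABBREV_TO_FULL:
            tokens.append(ABBREV_TO_FULL[upper])
        else:
            tokens.append(token)
    return tokens


# A command mixing both vocabulary forms is accepted.
print(normalize("SRCH TITLE lasers"))
```

Because both forms are reduced to one internal representation, the
heavy user and the casual user can share the same command processor.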

D. SEARCH CAPABILITIES 1,3,4,5,6,7

Search  capabilities  should  include  the  standard  options 
used  in  batch  processing,  such  as: 

1.  Exact  Match  -  Only  items  which  exactly  match  the 
desired  item  will  be  retrieved. 

2.  Prefix  Match  -  Every  item  beginning  with  the  desired 
prefix  will  be  retrieved. 

3.  Suffix  Match  -  Every  item  ending  with  the  desired 
suffix  will  be  retrieved. 

4.  Combined  Prefix  and  Suffix  Match  with  Masking  as 
Indicated  -  A  combination  of  prefix  and  suffix  and  masking 
options  allows  the  user  to  ask  for  middle  of  the  term  comparison. 


For  example,  a  search  on  the  term,  'concept',  using  prefix,  suffix, 
and character masking indicators in combination, will retrieve the
following  terms: 

concept 

concepts 

conceptualization 

conception 

preconcept 

preconception 

preconcepts 

preconceptualization 

misconcept 

misconception 

and other possible prefix and suffix combinations involving the
basic word, 'concept'. In effect, the combination of prefix,
suffix,  and  character  masking  indicators  offers  much  the  same 
retrieval  capability  as  word  root  stem  search  systems  which 
rely  on  dictionary  files,  and  extensive  prefix  and  suffix 
ending  comparison  tables. 
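
The four matching options above can be sketched with simple wildcard
patterns. The indicator symbols are assumptions made for this
illustration: '*' stands for any run of characters (the prefix and
suffix indicators) and '?' masks a single character. The report does
not say which symbols an actual system would use. The term list is
also invented.

```python
from fnmatch import fnmatchcase

# Invented index terms for illustration.
TERMS = [
    "concept", "concepts", "conception",
    "preconception", "misconception", "concert", "accept",
]


def masked_match(pattern, terms=TERMS):
    """Return the terms matching a pattern with '*' and '?' indicators."""
    return [term for term in terms if fnmatchcase(term, pattern)]


print(masked_match("concept"))      # exact match
print(masked_match("concept*"))     # prefix match
print(masked_match("*conception"))  # suffix match
print(masked_match("*concept*"))    # middle-of-the-term comparison
print(masked_match("conce?t"))      # single-character masking
```

No dictionary file or ending table is consulted; the pattern alone
determines which terms are retrieved.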

5. Hierarchical Expansion - If desired, the search term
can be compared against an on-line thesaurus file display and
the search can be expanded to include synonymous, narrower,
and/or related terms cited as appropriate by the thesaurus
file. The thesaurus display can include a usage count for each
of the appropriate terms shown. Hierarchical expansion capability
is not limited to words. It can also be applied to numeric or
alphanumeric classification systems which are hierarchically
structured, such as a hierarchical decimal classification system.
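
A minimal sketch of hierarchical expansion follows. The thesaurus
entries, relation names, and class numbers are invented for
illustration. The sketch also assumes that expansion follows the
chosen relations through every level of the hierarchy, which the
report does not state.

```python
# Invented thesaurus entries; relation names are assumptions.
THESAURUS = {
    "lasers": {
        "synonyms": ["optical masers"],
        "narrower": ["gas lasers", "solid state lasers"],
        "related": ["coherent light"],
    },
    "gas lasers": {"narrower": ["helium neon lasers"]},
}


def expand(term, relations=("synonyms", "narrower"), thesaurus=THESAURUS):
    """Return the term plus every term reachable through the relations."""
    found = [term]
    queue = [term]
    while queue:
        current = queue.pop(0)
        entry = thesaurus.get(current, {})
        for relation in relations:
            for candidate in entry.get(relation, []):
                if candidate not in found:
                    found.append(candidate)
                    queue.append(candidate)
    return found


def expand_class(code, codes):
    """Expand a decimal class number to itself and its subdivisions."""
    return [c for c in codes if c.startswith(code)]


print(expand("lasers"))
print(expand_class("621.3", ["620", "621.3", "621.38", "621.4"]))
```

In an interactive system the expanded list would be displayed, with
usage counts if available, so that the user could strike unwanted
terms before the search is run.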

6.  Quality  Comparison  on  All  Range  Variations 


7. Boolean Connectives - AND, OR, and NOT connectives
can be used as search parameters.

8. Limited Text Search - Any retrieval system with the
foregoing search capabilities is also capable of limited text
searching, provided the text data fields to be searched are
arranged in a sequential format, so that the text may be searched
on a word for word basis. The value of a limited text search
capability depends upon the data base. Data bases which have
one- or two-line text data such as titles can use text search
as a fundamental retrieval capability. Data bases which contain
a great deal of textual data may find text searching too
time-consuming to merit incorporation as a search capability unless
confined to searching titles or similar selected textual data
areas.

In either case, text retrieval requires a combination
of matching capabilities, Boolean connectives, and linkage
indicators. Linkage is required in order to indicate word
position acceptability. For instance, a text search on the
phrase 'thin films' requires parameters to indicate which or
how many of the following three word arrangements are
acceptable as an answer:

thin  films 

thin  magnetic  films 

films of thin people

Obviously, the merit of limited text searching is
dependent on the deductive reasoning and linguistic prowess
of the search analyst. Backed by an adequate on-line thesaurus,
however, and hierarchical expansion of term input, limited text
retrieval can be developed into a powerful ancillary search
tool for use with one- or two-line text data bases such as
bibliographic data banks.
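
The 'thin films' example can be sketched as follows. The linkage
parameters used here are assumptions made for illustration: a maximum
number of intervening words (max_gap) and a flag for whether word
order matters (ordered). The report does not define the form that
linkage indicators would take.

```python
TITLES = ["thin films", "thin magnetic films", "films of thin people"]


def boolean_and(text, *terms):
    """Plain Boolean AND: every term occurs somewhere in the text."""
    words = text.lower().split()
    return all(term in words for term in terms)


def linked(text, first, second, max_gap=0, ordered=True):
    """True if the two words occur with at most max_gap words between them.

    With ordered=True, `first` must precede `second`.
    """
    words = text.lower().split()
    for i, word in enumerate(words):
        if word != first:
            continue
        for j, other in enumerate(words):
            if other != second or i == j:
                continue
            if ordered and j < i:
                continue
            if abs(j - i) - 1 <= max_gap:
                return True
    return False


# Boolean AND alone accepts all three arrangements.
print([t for t in TITLES if boolean_and(t, "thin", "films")])
# Adjacent and in order: only the exact phrase.
print([t for t in TITLES if linked(t, "thin", "films")])
# One intervening word allowed.
print([t for t in TITLES if linked(t, "thin", "films", max_gap=1)])
```

The Boolean connective establishes that both words are present; the
linkage parameters then decide which word arrangements are acceptable.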

9. Weighting - Weighting retrieval capabilities are
currently dependent on values assigned by indexers as part
of data input. Any data base which includes weighting as an
input value can incorporate weighting as a search capability.
Weighting, and its allied value, linkage, need not be dependent
on man-assigned values but can be determined by
computer-performed statistical inference, an on-line capability
which is discussed in the next section.

E. AUTOMATIC QUERY FORMULATION BY MEANS OF STATISTICAL
INFERENCE ANALYSIS AND HEURISTIC OPTIMIZATION 5,6,8,9,10,11,12

Statistical inference is an on-line capability particularly
appropriate to information retrieval. Utilizing statistical
inference programs which can analyze in advance the hit
probabilities of every possible search strategy, the user could
cite known answers for computer analysis and ask the program
to determine the best search strategy to retrieve documents
which contain similar data. The degree of similarity required
could be specified by the user on a percentage basis, i.e.,
90%, 80%. The procedure used for statistical contingency analysis*
and percentage matching could also be adapted to analyze and assign
percentage weighting of input data. The actual analysis could be
based on the number of hits which a full match of every data field
would obtain. A succession of matches, subtracting one data field
at a time, would heuristically derive the optimum search strategy.

To  increase  efficiency,  automatic  search  formulation  analysis 
could be programmed to ignore various data fields such as
accession  numbers  to  avoid  pointless  comparison.  A  standard 
group  of  comparison  fields  could  be  determined  in  advance  with 
additional  fields  added  as  search  parameters  by  user  feedback. 

User  feedback  acquired  by  computer  interrogation  of  the  user  would 
be  utilized  to  weight  search  parameters,  such  as  main  subject 
concept  and  logistic  data  requirements  which  must  be  met  in  order 
to  satisfy  the  search  request. 

If  two  or  more  known  answers  are  cited  by  the  user,  the 
automatic  search  analysis  program  would  compare  them  for  data  field 
commonality, weight the search by means of user-identified
essentials, and proceed from there to determine the optimum search
strategy. The same search formulation analysis program combined
with user interrogation programs could be used to initiate search
strategies without using known answers as a basis for search
formulation. 
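
The procedure described above can be sketched under several stated
assumptions. The records and field names are invented. The "optimum"
strategy is taken to be the one that retains the most data fields
while still returning a minimum number of hits. Fields are dropped
one at a time, each time removing the field whose removal gains the
most hits. The report does not fix any of these choices.

```python
# Invented records; "accession" is ignored to avoid pointless comparison.
RECORDS = [
    {"accession": "AD-1", "subject": "lasers", "year": 1969, "agency": "Navy"},
    {"accession": "AD-2", "subject": "lasers", "year": 1969, "agency": "Army"},
    {"accession": "AD-3", "subject": "lasers", "year": 1968, "agency": "Navy"},
    {"accession": "AD-4", "subject": "sonar", "year": 1969, "agency": "Navy"},
]
IGNORED_FIELDS = {"accession"}


def common_fields(known_answers):
    """Field values shared by every known answer cited by the user."""
    first, *rest = known_answers
    return {
        field: value
        for field, value in first.items()
        if field not in IGNORED_FIELDS
        and all(other.get(field) == value for other in rest)
    }


def hits(records, criteria):
    """Records matching every field value in the criteria."""
    return [r for r in records if all(r.get(f) == v for f, v in criteria.items())]


def relax(records, criteria, min_hits, essential=()):
    """Drop one non-essential field at a time until min_hits is reached."""
    criteria = dict(criteria)
    trail = [(tuple(criteria), len(hits(records, criteria)))]
    while trail[-1][1] < min_hits:
        droppable = [f for f in criteria if f not in essential]
        if not droppable:
            break
        # Trial and error: test each removal, keep the one gaining most hits.
        drop = max(
            droppable,
            key=lambda f: len(
                hits(records, {k: v for k, v in criteria.items() if k != f})
            ),
        )
        del criteria[drop]
        trail.append((tuple(criteria), len(hits(records, criteria))))
    return criteria, trail


start = common_fields([RECORDS[0]])
strategy, trail = relax(RECORDS, start, min_hits=3, essential=("subject",))
print(strategy)
for fields, count in trail:
    print(fields, count)
```

The essential fields stand in for the user-identified essentials
obtained by computer interrogation; without them the procedure might
discard the main subject concept in order to gain hits.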

*Statistical contingency analysis refers to the statistical
frequency analysis of the occurrence of items within a given context
or environment.

The key factors in automatic query formulation by means of
statistical contingency analysis and heuristic programming
techniques are on-line user feedback capabilities and the
incredibly vast and almost instantaneous computational
capacities of the computer. Optimum search strategy formulation
by computer is not achieved by deduction, but by trial and error,
or heuristics, a very inadequate procedure for humans. When done
by computer, trial and error computation is done so massively
and instantaneously that an optimum search strategy can be
heuristically obtained by computer in less time than it takes
for a human analyst to enter his own search strategy into the
on-line retrieval system. Automatic search formulation
based on on-line user feedback, statistical contingency analysis,
and heuristic optimization procedures will undoubtedly develop
into the primary search formulation method for all future on-line
information systems regardless of their data base content. The
concept involved is extraordinarily practical in terms of
potential use and personnel cost reduction. Eventually, a
generalized query formulation program will be developed and made
adaptable to every kind of data base. Query coordinators and
search analysts may prove redundant.


F.  DISPLAY  CAPABILITIES 


Ideally,  display  capabilities  should  be  the  same  for  all 
output  media  and  should  be  able  to  meet  every  foreseeable  user 
requirement. To accomplish this goal, the ideal system would
utilize a universal report generator capable of displaying,
sorting, formatting, and summing as required any data supplied
to it, regardless of the data source.5 In addition to being
capable of accepting any design requirement, the report
generator  should  be  able  to  supply  standard  displays  and  report 
formats  pertinent  to  search  formulation,  search  result  statistics, 
and  search  review  or  document  browsing. 

Search  result  statistics  displays  should  be  in  the  form  of 
a  matrix  showing  hits  per  search  parameter  and  should  be  ranked 
according  to  number  of  parameters  met.  In  conjunction  with  this 
display  the  user  should  be  able  to  request  accession  displays  for 
each  rank  of  parameters  met  by  specifying  which  rank  of  answer 
sets  he  wishes  to  see. 
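
A minimal sketch of such a search result statistics display follows.
The records, field names, and search parameters are invented for
illustration. The sketch counts hits per search parameter and groups
accession numbers by the number of parameters met, highest rank first.

```python
# Invented records and search parameters.
RECORDS = [
    {"accession": "AD-1", "subject": "lasers", "year": 1969, "agency": "Navy"},
    {"accession": "AD-2", "subject": "lasers", "year": 1969, "agency": "Army"},
    {"accession": "AD-3", "subject": "lasers", "year": 1968, "agency": "Navy"},
    {"accession": "AD-4", "subject": "sonar", "year": 1969, "agency": "Air Force"},
]
PARAMETERS = {"subject": "lasers", "year": 1969, "agency": "Navy"}


def hits_per_parameter(records, parameters):
    """Number of records meeting each search parameter taken alone."""
    return {
        field: sum(1 for r in records if r.get(field) == value)
        for field, value in parameters.items()
    }


def rank_by_parameters_met(records, parameters):
    """Accession numbers grouped by how many parameters each record meets."""
    ranks = {}
    for record in records:
        met = sum(
            1 for field, value in parameters.items() if record.get(field) == value
        )
        if met:
            ranks.setdefault(met, []).append(record["accession"])
    return dict(sorted(ranks.items(), reverse=True))


print(hits_per_parameter(RECORDS, PARAMETERS))
for met, accessions in rank_by_parameters_met(RECORDS, PARAMETERS).items():
    print(met, "parameters met:", ", ".join(accessions))
```

The user would then ask for the accession display of a given rank,
for example every record that met two of the three parameters.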

On-line print requests should be capable of by-passing the
CRT display prior to printing. The user should be able to end
a CRT display sequence at will and should also be able to have
any portion of a CRT display printed at will. He should be able
to alter display data prior to a print request and have the display
printed as altered.

Print or display requests involving search result displays
should be capable of accepting user-specified headings and data
field labels in place of standard headings or labels.
User-designed formats should automatically display a sample format
based on the user's design and should be capable of accepting
format design revisions by the user.
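
The substitution of user-specified labels for standard labels can be
sketched as follows. The standard labels, field names, and sample
record are hypothetical. Rendering a sample record in the user's
design corresponds to the sample format display described above.

```python
# Hypothetical standard labels for three data fields.
STANDARD_LABELS = {"accession": "ACCESSION NO.", "title": "TITLE", "year": "YEAR"}


def render(record, fields, labels=None):
    """Format the chosen fields, with user labels overriding standard ones."""
    merged = {**STANDARD_LABELS, **(labels or {})}
    width = max(len(merged[field]) for field in fields)
    return "\n".join(
        f"{merged[field]:<{width}}  {record[field]}" for field in fields
    )


sample = {"accession": "AD-1", "title": "Thin Films", "year": 1969}

# Standard labels.
print(render(sample, ["accession", "title", "year"]))
# User design: two fields only, with "PUBLISHED" in place of "YEAR".
print(render(sample, ["title", "year"], labels={"year": "PUBLISHED"}))
```

A format design revision is simply another call with a revised field
list or label set, so the user can inspect each revision before
committing it to a print request.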

The  universal  report  generator  should  also  be  able  to  supply 
standard  graphic  displays  of  statistical  data.  Graphics 
capabilities  should  include  color  choice;  graph  superimposition; 
and  a  choice  of  graph  type,  e.g.,  axial,  bar,  or  pie  chart. 

G.  SPECIAL  FILES 

1.  Temporary  Storage  File  -  A  temporary  storage  file 
should  be  available  for  selected  answer  transfer  and  storage. 
Selected  answers  stored  on  the  temporary  storage  file  should 
have  the  same  search  and  display  capabilities  as  the  main  data 
base. 

2. Query Library File - Permanent storage space should
be allotted for a Query Library File. The Query Library File
should be searchable and capable of displaying its data of query
formulations, search statistics, and report generator formats.5
Appropriate data would be automatically generated and transferred
to the Query Library File whenever a query is addressed to the
main data base. The Query Library File would be used to maintain
statistics on system use and to store formulations and formats
used for recurrent queries such as standard recurrent subject
bibliographies or user interest profiles. Whenever its allotted
space nears capacity, the Query Library File should automatically
submit a self-printout for review and selective data re-storage.

H.  EQUIPMENT  REQUIREMENTS 

The  type  and  extent  of  remote  terminal  equipment  depends  upon 
the  user's  needs.  Casual  use  would  not  require  a  CRT  or  high 
speed  printer  and,  in  many  cases,  the  only  required  equipment 
would  be  a  typewriter  console  and  data  modem. 

Heavy and complex usage would require the following
equipment  triad: 

CRT  Display 

Typewriter  Keyboard  and  auxiliary  input  devices 

High  Speed  Printer 

1. CRT Display - The ideal CRT display would include a full
page of text, displayed clearly and legibly, and requiring no
eye strain. Display speed should be user governable to accommodate
various reading speeds and data comprehension rates. All display
controls should be located on the front of the display and be
easily accessible. The display equipment should include a projection
device for wall screen enlargement of CRT displays for large audience
viewing. The CRT unit should be as compact and light as possible
and easily moved.

2. Keyboard and Display Control - The keyboard should contain
the standard electric keyboard with cursor and display control
keys located on the user's right for easy access and use. If the
terminal is to be used for remote batch input, the keyboard should
accept auxiliary input devices such as magnetic disk or cassette
tapes, punched cards, data phones, etc.

3. Printer - The printer should be a silent, high-speed,
non-impact device capable of producing alphanumeric or graphical
printouts  in  single  or  multiple  copies  as  needed. 

I.  SUMMARY 

On-line  interaction  between  the  user  and  the  computer  opens 
the  door  for  the  use  of  the  immense  computational  capabilities  of 
the  computer  as  a  tool  for  information  retrieval  methods  based 
on  statistical  analysis  of  user  feedback.  User-computer  on-line 
interaction  also  offers  possibilities  for  display  editing  and 
data  manipulation  far  beyond  current  program  limitations  of 
on-line display and report generators. Specifically, the ideal
on-line information retrieval and display system should include
all the capabilities of batch processing enhanced by the
directive and interaction capabilities of on-line systems for
user-computer information exchange.